feat(librarian): add documentation discovery workflow for targeted doc investigation #377
…c investigation
- Add Phase 0.5 (Documentation Discovery) before TYPE A and D requests
- Sequential flow: websearch → version check → sitemap → targeted investigation
- Enables version-specific documentation lookup when user specifies version
- Sitemap discovery helps understand doc structure before searching
- Update tool reference with sitemap and doc page fetching
- Add failure recovery for sitemap/versioned docs not found cases
All contributors have signed the CLA. Thank you! ✅
I have read the CLA Document and I hereby sign the CLA
Greptile Summary
Adds Phase 0.5 Documentation Discovery workflow to improve documentation lookup accuracy before TYPE A (Conceptual) and TYPE D (Comprehensive) requests. The new workflow discovers the official docs URL, verifies versioned documentation, fetches sitemap.xml to understand the doc structure, then performs targeted investigation instead of random searching.
Issue Found: Changed tool name from
Confidence Score: 3/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant Librarian
participant WebSearch as websearch_exa
participant Sitemap as webfetch(sitemap)
participant Docs as webfetch(doc_pages)
participant Context7
participant GrepApp as grep_app
Note over User,GrepApp: TYPE A or TYPE D Request
User->>Librarian: "How do I use React 18?"
rect rgb(230, 240, 255)
Note right of Librarian: Phase 0.5: Documentation Discovery (SEQUENTIAL)
Librarian->>WebSearch: "React official documentation site"
WebSearch-->>Librarian: https://react.dev
Librarian->>WebSearch: "React v18 documentation"
WebSearch-->>Librarian: Versioned URL confirmed
Librarian->>Sitemap: GET /sitemap.xml
Sitemap-->>Librarian: Parse doc structure
Note right of Librarian: Identify relevant sections from sitemap
end
rect rgb(240, 255, 240)
Note right of Librarian: Phase 1: Main Investigation (PARALLEL)
par Parallel Execution
Librarian->>Context7: resolve-library-id("react")
Context7-->>Librarian: library_id
Librarian->>Context7: query-docs(id, "hooks")
Context7-->>Librarian: Official docs
and
Librarian->>Docs: GET /docs/hooks.html (from sitemap)
Docs-->>Librarian: Targeted doc page
and
Librarian->>GrepApp: searchGitHub("React hooks usage")
GrepApp-->>Librarian: Code examples
end
end
Librarian->>User: Synthesized answer with permalinks
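The two phases in the diagram can be sketched in TypeScript as a sequential discovery step followed by a parallel fan-out. This is only a minimal sketch under stated assumptions, not the librarian's actual implementation (the librarian is driven by a prompt, not by code like this): the `Tools` interface, the `discoverThenInvestigate` function, and the "first sitemap URL containing the topic" heuristic are all hypothetical names and choices introduced for illustration.

```typescript
// Hypothetical tool surface, introduced only for this sketch.
interface Tools {
  webSearch(query: string): Promise<string>; // resolves to a URL
  webFetch(url: string): Promise<string>; // resolves to a page body
  queryDocs(library: string, topic: string): Promise<string>;
  searchCode(query: string): Promise<string>;
}

async function discoverThenInvestigate(
  tools: Tools,
  library: string,
  version: string,
  topic: string,
): Promise<string[]> {
  // Phase 0.5: each step needs the previous result, so the calls are awaited one by one.
  const docsUrl = await tools.webSearch(`${library} official documentation site`);
  const versionedUrl = await tools.webSearch(`${library} v${version} documentation`);
  const sitemap = await tools.webFetch(docsUrl.replace(/\/+$/, "") + "/sitemap.xml");

  // Collect every <loc> entry from the sitemap XML.
  const locPattern = /<loc>([^<]+)<\/loc>/g;
  const urls: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(sitemap)) !== null) {
    urls.push(match[1]);
  }

  // Assumed heuristic: take the first URL that mentions the topic,
  // falling back to the versioned docs URL when nothing matches.
  const target = urls.find((u) => u.includes(topic)) ?? versionedUrl;

  // Phase 1: the three lookups are independent, so they run in parallel.
  return Promise.all([
    tools.queryDocs(library, topic),
    tools.webFetch(target),
    tools.searchCode(`${library} ${topic} usage`),
  ]);
}
```

The design point the sketch tries to capture is that only the discovery steps are serialized; the main investigation still fans out with `Promise.all`.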
Additional Comments (3)
- src/agents/librarian.ts, line 114: syntax: tool name changed from `context7_get-library-docs` to `context7_query-docs` but `src/hooks/agent-usage-reminder/constants.ts:18` still references the old name
- src/agents/librarian.ts, line 181: syntax: tool name changed from `context7_get-library-docs` to `context7_query-docs` but `src/hooks/agent-usage-reminder/constants.ts:18` still references the old name
- src/agents/librarian.ts, line 237: syntax: tool name changed from `context7_get-library-docs` to `context7_query-docs` but `src/hooks/agent-usage-reminder/constants.ts:18` still references the old name
1 file reviewed, 3 comments
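One way to keep a prompt and a constants file from drifting apart like this is to define each tool name once and reference it everywhere. The following is a hedged sketch, not code from the repository: `TOOL_NAMES`, `ToolName`, and `isKnownTool` are hypothetical, and only the two string values are taken from this conversation.

```typescript
// Hypothetical shared definition; the prompt and the reminder constants would
// both read from here instead of repeating string literals.
const TOOL_NAMES = {
  resolveLibraryId: "context7_resolve-library-id",
  queryDocs: "context7_query-docs",
} as const;

type ToolName = (typeof TOOL_NAMES)[keyof typeof TOOL_NAMES];

// Narrows an arbitrary string to a known tool name, so stale names fail a check
// instead of silently reaching the agent.
function isKnownTool(name: string): name is ToolName {
  const names: readonly string[] = [TOOL_NAMES.resolveLibraryId, TOOL_NAMES.queryDocs];
  return names.indexOf(name) !== -1;
}
```

With this shape, a leftover reference to the old name would be rejected by `isKnownTool` rather than surviving in one file after a rename in another.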
4 issues found across 1 file
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/agents/librarian.ts">
<violation number="1" location="src/agents/librarian.ts:53">
P2: TYPE A table says `context7 + websearch_exa`, but TYPE A instructions now use `webfetch(...)` from the sitemap instead. Align the table with the actual workflow so the agent follows a single, consistent playbook.</violation>
<violation number="2" location="src/agents/librarian.ts:76">
P3: The `webfetch(... + "/versions" or "/v{version}")` example is ambiguous/invalid pseudo-code. Split into explicit alternatives like you do for sitemap fallbacks.</violation>
<violation number="3" location="src/agents/librarian.ts:96">
P1: `context7_query-docs` appears to be an incorrect/non-existent tool name (only used in this prompt, while the repo references `context7_get-library-docs`). This will likely break the librarian’s Context7 step or confuse tool usage; use the real tool name consistently.</violation>
<violation number="4" location="src/agents/librarian.ts:114">
P2: Context7 call examples are inconsistent (`query:` vs `topic:`). Pick one argument shape and use it everywhere to avoid invalid tool-call schemas.</violation>
</file>
- Fix ambiguous webfetch pseudo-code (split /versions or /v{version})
- Unify context7_query-docs parameter format (query: instead of topic:)
- Update agent-usage-reminder constants to use context7_query-docs
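The first fix above, splitting the ambiguous `"/versions" or "/v{version}"` form, can be illustrated as an explicit ordered list of candidates. This is a minimal sketch; `versionedDocCandidates` is a hypothetical helper name, and the order of the two candidates is an assumption.

```typescript
// Builds the explicit list of URLs to try, in order, instead of a single
// ambiguous `base + "/versions" or "/v{version}"` expression.
function versionedDocCandidates(baseUrl: string, version: string): string[] {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return [`${base}/versions`, `${base}/v${version}`];
}
```

Each candidate can then be fetched in turn, which mirrors how the sitemap fallbacks are already listed as separate alternatives.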
@sisyphus-dev-ai review this. And how do you think this will change the librarian's behavior? Will it be slower? More precise? More likely to answer with deprecated stuff? ulw to share your opinion
Review: Documentation Discovery Workflow for Librarian
Great question @code-yeongyu! Let me break down the behavioral impact of this change.
TL;DR
Detailed Analysis
Speed Trade-off
Before: Immediate parallel fire (3+ calls for TYPE A, 6+ for TYPE D)
After: Sequential discovery → then parallel fire
This adds ~2-4 seconds latency on TYPE A/D requests. The main investigation phase remains parallel.
Precision Improvement (The Real Win)
The current librarian does "hopeful parallel searching" - it fires multiple tools simultaneously without knowing WHERE the relevant documentation lives. This leads to:
The new flow:
This is like the difference between grepping blindly vs. understanding the codebase structure first.
Deprecated Content (Counter-intuitive!)
You asked if this increases deprecated content risk. It's actually the opposite. The new Phase 0.5 explicitly:
Before this change, the librarian had zero version awareness. It would happily return Next.js 13 App Router docs when the user was asking about Next.js 14.
Trade-off Summary
Verdict
This is a good trade-off. Documentation questions benefit more from precision than raw speed. Users asking "How do I use X in version Y?" care about getting the right answer, not getting any answer 2 seconds faster. The sequential discovery is a one-time upfront cost that dramatically improves the quality of the parallel phase that follows.
Minor Suggestion
Consider adding a skip condition for well-known libraries with consistent doc structures (React, Next.js, etc.) where we already know the sitemap location. This could recover some latency for common cases. But this is a nice-to-have, not a blocker.
LGTM - The precision improvement justifies the latency cost.
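The skip condition suggested above could look roughly like the following. It is a sketch under assumptions, not a proposal for the actual prompt: `KNOWN_DOC_SITES` and `needsDiscovery` are hypothetical names, and the single entry reuses the React docs URL from the sequence diagram purely as an example.

```typescript
// Hypothetical table of libraries whose documentation root is already known.
// The entry is illustrative; a real list would need to be curated and verified.
const KNOWN_DOC_SITES = new Map<string, string>([
  ["react", "https://react.dev"],
]);

// Returns true when Phase 0.5 discovery should still run for this library.
function needsDiscovery(library: string): boolean {
  return !KNOWN_DOC_SITES.has(library.trim().toLowerCase());
}
```

A `Map` is used instead of a plain object so that library names such as `constructor` are not mistaken for known entries.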
GMGM any updates here?
src/agents/librarian.ts
Outdated
| TYPE B (Implementation) | 2-3 |
| TYPE C (Context) | 2-3 |
| TYPE D (Comprehensive) | 3-5 |
| Request Type | Minimum Parallel Calls | Doc Discovery Required |
This part should not be changed imo
"webfetch",
"context7_resolve-library-id",
"context7_query-docs",
"websearch_exa_web_search_exa",
The tool name has changed to websearch
4 issues found across 2 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/agents/librarian.ts">
<violation number="1" location="src/agents/librarian.ts:270">
P2: Markdown table separator row is missing the third column separator. This will cause the table to render incorrectly.</violation>
<violation number="2" location="src/agents/librarian.ts:272">
P2: Missing pipe separator between '2-3' and 'NO'. This row should have three columns.</violation>
<violation number="3" location="src/agents/librarian.ts:273">
P2: Missing pipe separator between '2-3' and 'NO'. This row should have three columns.</violation>
<violation number="4" location="src/agents/librarian.ts:275">
P1: Orphan/incomplete table row that appears to be leftover from editing. This line should be removed.</violation>
</file>
| TYPE B (Implementation) | 2-3 NO |
| TYPE C (Context) | 2-3 NO |
| TYPE D (Comprehensive) | 3-5 | YES (Phase 0.5 first) |
| Request Type | Minimum Parallel Calls
P1: Orphan/incomplete table row that appears to be leftover from editing. This line should be removed.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/agents/librarian.ts, line 275:
<comment>Orphan/incomplete table row that appears to be leftover from editing. This line should be removed.</comment>
<file context>
@@ -296,16 +264,15 @@ Use OS-appropriate temp directory:
+| TYPE B (Implementation) | 2-3 NO |
+| TYPE C (Context) | 2-3 NO |
+| TYPE D (Comprehensive) | 3-5 | YES (Phase 0.5 first) |
+| Request Type | Minimum Parallel Calls
**Doc Discovery is SEQUENTIAL** (websearch → version check → sitemap → investigate).
</file context>
| TYPE D (Comprehensive) | 3-5 |
| TYPE A (Conceptual) | 1-2 | YES (Phase 0.5 first) |
| TYPE B (Implementation) | 2-3 NO |
| TYPE C (Context) | 2-3 NO |
P2: Missing pipe separator between '2-3' and 'NO'. This row should have three columns.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/agents/librarian.ts, line 273:
<comment>Missing pipe separator between '2-3' and 'NO'. This row should have three columns.</comment>
<file context>
@@ -296,16 +264,15 @@ Use OS-appropriate temp directory:
+|--------------|----------------|
+| TYPE A (Conceptual) | 1-2 | YES (Phase 0.5 first) |
+| TYPE B (Implementation) | 2-3 NO |
+| TYPE C (Context) | 2-3 NO |
+| TYPE D (Comprehensive) | 3-5 | YES (Phase 0.5 first) |
+| Request Type | Minimum Parallel Calls
</file context>
-| TYPE C (Context) | 2-3 NO |
+| TYPE C (Context) | 2-3 | NO |
| TYPE C (Context) | 2-3 |
| TYPE D (Comprehensive) | 3-5 |
| TYPE A (Conceptual) | 1-2 | YES (Phase 0.5 first) |
| TYPE B (Implementation) | 2-3 NO |
P2: Missing pipe separator between '2-3' and 'NO'. This row should have three columns.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/agents/librarian.ts, line 272:
<comment>Missing pipe separator between '2-3' and 'NO'. This row should have three columns.</comment>
<file context>
@@ -296,16 +264,15 @@ Use OS-appropriate temp directory:
+| Request Type | Suggested Calls | Doc Discovery Required |
+|--------------|----------------|
+| TYPE A (Conceptual) | 1-2 | YES (Phase 0.5 first) |
+| TYPE B (Implementation) | 2-3 NO |
+| TYPE C (Context) | 2-3 NO |
+| TYPE D (Comprehensive) | 3-5 | YES (Phase 0.5 first) |
</file context>
-| TYPE B (Implementation) | 2-3 NO |
+| TYPE B (Implementation) | 2-3 | NO |
| Request Type | Suggested Calls |
| Request Type | Suggested Calls | Doc Discovery Required |
|--------------|----------------|
P2: Markdown table separator row is missing the third column separator. This will cause the table to render incorrectly.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/agents/librarian.ts, line 270:
<comment>Markdown table separator row is missing the third column separator. This will cause the table to render incorrectly.</comment>
<file context>
@@ -296,16 +264,15 @@ Use OS-appropriate temp directory:
-| TYPE C (Context) | 4+ | NO |
-| TYPE D (Comprehensive) | 6+ | YES (Phase 0.5 first) |
+| Request Type | Suggested Calls | Doc Discovery Required |
+|--------------|----------------|
+| TYPE A (Conceptual) | 1-2 | YES (Phase 0.5 first) |
+| TYPE B (Implementation) | 2-3 NO |
</file context>
-|--------------|----------------|
+|--------------|----------------|------------------------|
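All four table issues flagged in this review come down to rows whose column count differs from the header's. A small check along these lines could catch them before review. This is a hedged sketch: `cellCount` and `findMalformedRows` are hypothetical helpers, and the splitting is deliberately naive (it does not handle escaped pipes or pipes inside inline code).

```typescript
// Counts the cells in one markdown table row by splitting on every pipe
// after removing the optional leading and trailing pipe.
function cellCount(row: string): number {
  const trimmed = row.trim().replace(/^\|/, "").replace(/\|$/, "");
  return trimmed.split("|").length;
}

// Returns the 1-based positions of rows whose cell count differs from the header row.
function findMalformedRows(tableLines: string[]): number[] {
  if (tableLines.length === 0) return [];
  const expected = cellCount(tableLines[0]);
  const bad: number[] = [];
  tableLines.forEach((line, i) => {
    if (cellCount(line) !== expected) bad.push(i + 1);
  });
  return bad;
}
```

Run against the table as it stood in the flagged commit, the check reports the separator row, the TYPE B and TYPE C rows, and the orphan row, which matches the four violations listed by the reviewer.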
Summary
Changes
New Documentation Discovery Flow (Phase 0.5)
Why This Matters
Previously, the librarian would immediately fire parallel searches without understanding the documentation structure. This led to:
Now the flow is:
Updated Components
Summary by cubic
Adds a Documentation Discovery phase before TYPE A and D requests to locate official, versioned docs and fetch targeted pages. Improves accuracy and cuts down random, unfocused searches.
New Features
Refactors
Written for commit 4d8b808. Summary will update on new commits.